Continuous wavelet transform for analysis of speech prosody
Abstract
Wavelet-based time-frequency representations of various signals have been shown to represent perceptually relevant patterns reliably, at various spatial and temporal scales and in a noise-robust way. Here we present a wavelet-based visualization and analysis tool for prosodic patterns, in particular intonation. The suitability of the method is assessed by comparing its predictions for word prominences against manual labels in a corpus of 900 sentences. In addition, the method's potential for visualization is demonstrated with a few example sentences, which are compared against more traditional visualization methods. Finally, some further applications are suggested and the limitations of the method are discussed.
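To make the idea concrete, here is a minimal sketch of a continuous wavelet transform applied to an intonation contour. It is not the tool described in the abstract. The Mexican hat wavelet, the five octave-spaced scales, the 5 ms frame step, and the synthetic f0 contour (a slow declination plus one accent-like bump) are all assumptions chosen for illustration, and the names `mexican_hat` and `cwt` are hypothetical.

```python
import numpy as np

def mexican_hat(t, scale):
    """Mexican hat (Ricker) wavelet sampled at times t, dilated by `scale`."""
    x = t / scale
    return (1.0 - x**2) * np.exp(-0.5 * x**2) / np.sqrt(scale)

def cwt(signal, scales, dt):
    """Direct-convolution CWT; returns one row of coefficients per scale."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Truncate the wavelet at +/- 5 scales, capped so padding stays valid.
        half = min(int(5 * s / dt), (n - 1) // 2)
        kernel = mexican_hat(np.arange(-half, half + 1) * dt, s)
        # Reflect-pad to soften edge effects, then keep only the valid part.
        padded = np.pad(signal, half, mode="reflect")
        out[i] = np.convolve(padded, kernel, mode="valid") * dt
    return out

# Synthetic f0 contour (assumed): 2 s at 5 ms frames, declining from 120 Hz,
# with a 30 Hz accent-like bump centred at t = 1.0 s.
dt = 0.005
t = np.arange(400) * dt
f0 = 120.0 - 10.0 * t + 30.0 * np.exp(-0.5 * ((t - 1.0) / 0.1) ** 2)

# Work on log-f0, normalised to zero mean and unit variance.
logf0 = np.log(f0)
logf0 = (logf0 - logf0.mean()) / logf0.std()

scales = 0.025 * 2.0 ** np.arange(5)  # 25 ms to 400 ms, one octave apart
coeffs = cwt(logf0, scales, dt)

# At the scale matching the bump width, the strongest response marks the bump.
peak_frame = int(np.argmax(coeffs[2]))
print(f"peak at t = {peak_frame * dt:.3f} s for scale {scales[2]:.3f} s")
```

Each row of `coeffs` shows the contour at one temporal scale. Plotting the rows as an image gives a scalogram-style view, in which peaks at coarser scales correspond to slower, more phrase-like movements and peaks at finer scales to faster, more local ones.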
Similar resources
Mapping areal variation and majority language influence in North Sámi using hierarchical prosodic analysis
This paper presents the results of a statistical hierarchical analysis of areal variation in the prosody of the North Sámi language. The hierarchical analysis method compares unigram models using a cross-entropy measure. The models depict distributions of Δ-features of f0 and energy signals decomposed using the Continuous Wavelet Transform [1]. These signals are obtained from speech recordings of five areal ...
Hierarchical Representation of Prosody for Statistical Speech Synthesis
Prominences and boundaries are the essential constituents of prosodic structure in speech. They provide the means to chunk the speech stream into linguistically relevant units by giving them relative saliences and demarcating them within coherent utterance structures. Prominences and boundaries have both been widely used in basic research on prosody as well as in text-to-speech synt...
Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion
Emotional voice conversion aims at converting speech from one emotion state to another. This paper proposes to model timbre and prosody features using a deep bidirectional long short-term memory (DBLSTM) network for emotional voice conversion. A continuous wavelet transform (CWT) representation of the fundamental frequency (F0) and energy contours is used for prosody modeling. Specifically, we use CWT to de...
Emotional Voice Conversion with Adaptive Scales F0 Based on Wavelet Transform Using Limited Amount of Emotional Data
Deep learning techniques have been successfully applied to speech processing. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as mel cepstral coefficients (MCC), which represent the spectrum features in voice conversion (VC) tasks. Despite these successes, the approach is restricted to problems with moderate dimension and sufficient data. Thus, in emot...
Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform
An artificial neural network is an important model for training features of voice conversion (VC) tasks. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as Mel Cepstral Coefficients (MCC), which represent the spectrum features. However, a simple representation of fundamental frequency (F0) is not enough for NNs to deal with emotional voice VC. This is ...